Suicide is the deliberate act of ending one’s own life, often stemming from various mental disorders such as depression, bipolar disorder, autism, schizophrenia, and personality disorders, as well as external stressors like financial struggles, academic pressures, relationship issues, or experiences of harassment and bullying. Factors like substance abuse, including alcoholism and benzodiazepine use, also contribute to this tragic outcome. Previous suicide attempts significantly elevate the risk of future attempts. Efforts to prevent suicide involve a multifaceted approach, including restricting access to common methods like firearms, drugs, and poisons, addressing mental health issues and substance abuse, responsible media reporting on suicides, and fostering better economic conditions. Despite the widespread availability of crisis hotlines, their effectiveness remains inadequately researched. The prevalence and methods of suicide vary across countries, often influenced by the accessibility of lethal means. Hanging, pesticide ingestion, poisoning, and firearms are among the most commonly used methods. Globally, suicides claim over 700,000 lives annually, ranking suicide as the 10th leading cause of death worldwide. Approximately 1.5% of people die by suicide, translating to roughly 12 per 100,000 individuals each year. Men are more likely to die by suicide than women, with rates ranging from 1.5 times higher in developing countries to 3.5 times higher in developed ones. Financial strain often exacerbates the risk of suicide. [1]
Suicide is not confined to high-income countries; it is a global issue affecting all regions. Surprisingly, over 77% of suicides occur in low- and middle-income countries. However, high-income countries exhibit the highest age-standardized suicide rates. In ongoing research across 19 countries, we explore correlations between GDP, bankruptcy rates, happiness levels, and suicide rates. Understanding these intricate relationships is pivotal for policymakers, researchers, and stakeholders to devise effective interventions promoting mental well-being and economic resilience. Leveraging multidimensional datasets and advanced analytical methods, our project aims to illuminate these connections for the betterment of societies worldwide.
# Load libraries
library(dplyr)
library(readr)
library(tidyr)
library(lubridate)
library(ggplot2)
library(stringr)
library(readxl)
library(httr)
# Load dataset
bankruptcies <- read_csv("https://raw.githubusercontent.com/Alexburk93/Data_Wrangling_EDA/main/data/raw_data/Bankruptcies_2011-2020.csv")
New names:
Rows: 987 Columns: 19
── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (13): COU, Country, VAR, Variable, MEA, Measure, ISIC4...7, ISIC4...8, TIME, Time, Unit Code, Unit, PowerCode
dbl (2): PowerCode Code, Value
lgl (4): Reference Period Code, Reference Period, Flag Codes, Flags
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
spc_tbl_ [987 × 19] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ COU : chr [1:987] "CAN" "CAN" "CAN" "CAN" ...
$ Country : chr [1:987] "Canada" "Canada" "Canada" "Canada" ...
$ VAR : chr [1:987] "BANKRUPTCIES" "BANKRUPTCIES" "BANKRUPTCIES" "BANKRUPTCIES" ...
$ Variable : chr [1:987] "Number of bankruptcies" "Number of bankruptcies" "Number of bankruptcies" "Number of bankruptcies" ...
$ MEA : chr [1:987] "INDEX" "INDEX" "INDEX" "INDEX" ...
$ Measure : chr [1:987] "Index 2007=100" "Index 2007=100" "Index 2007=100" "Index 2007=100" ...
$ ISIC4...7 : chr [1:987] "01_99" "01_99" "01_99" "01_99" ...
$ ISIC4...8 : chr [1:987] "Grand Total" "Grand Total" "Grand Total" "Grand Total" ...
$ TIME : chr [1:987] "2011-Q1" "2011-Q2" "2011-Q3" "2011-Q4" ...
$ Time : chr [1:987] "Q1-2011" "Q2-2011" "Q3-2011" "Q4-2011" ...
$ Unit Code : chr [1:987] "IDX" "IDX" "IDX" "IDX" ...
$ Unit : chr [1:987] "Index" "Index" "Index" "Index" ...
$ PowerCode Code : num [1:987] 0 0 0 0 0 0 0 0 0 0 ...
$ PowerCode : chr [1:987] "Units" "Units" "Units" "Units" ...
$ Reference Period Code: logi [1:987] NA NA NA NA NA NA ...
$ Reference Period : logi [1:987] NA NA NA NA NA NA ...
$ Value : num [1:987] 60 58.3 56.3 55.3 54 ...
$ Flag Codes : logi [1:987] NA NA NA NA NA NA ...
$ Flags : logi [1:987] NA NA NA NA NA NA ...
- attr(*, "spec")=
.. cols(
.. COU = col_character(),
.. Country = col_character(),
.. VAR = col_character(),
.. Variable = col_character(),
.. MEA = col_character(),
.. Measure = col_character(),
.. ISIC4...7 = col_character(),
.. ISIC4...8 = col_character(),
.. TIME = col_character(),
.. Time = col_character(),
.. `Unit Code` = col_character(),
.. Unit = col_character(),
.. `PowerCode Code` = col_double(),
.. PowerCode = col_character(),
.. `Reference Period Code` = col_logical(),
.. `Reference Period` = col_logical(),
.. Value = col_double(),
.. `Flag Codes` = col_logical(),
.. Flags = col_logical()
.. )
- attr(*, "problems")=<externalptr>
COU Country VAR Variable MEA Measure
Length:987 Length:987 Length:987 Length:987 Length:987 Length:987
Class :character Class :character Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character
ISIC4...7 ISIC4...8 TIME Time Unit Code Unit
Length:987 Length:987 Length:987 Length:987 Length:987 Length:987
Class :character Class :character Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character
PowerCode Code PowerCode Reference Period Code Reference Period Value Flag Codes
Min. :0 Length:987 Mode:logical Mode:logical Min. : 32.09 Mode:logical
1st Qu.:0 Class :character NA's:987 NA's:987 1st Qu.: 97.05 NA's:987
Median :0 Mode :character Median :121.18
Mean :0 Mean :139.36
3rd Qu.:0 3rd Qu.:146.24
Max. :0 Max. :949.42
Flags
Mode:logical
NA's:987
Australia Belgium Brazil Canada Denmark Finland France
41 82 36 40 40 80 40
Germany Iceland Italy Japan Netherlands New Zealand Norway
80 40 40 40 80 28 80
South Africa Spain Sweden United Kingdom United States
38 40 82 40 40
[1] 3948
# Remove the all-NA columns and the duplicate TIME column
data <- select(bankruptcies, -`Flag Codes`, -Flags, -`Reference Period Code`, -`Reference Period`, -TIME)
data %>%
sample_n(10)
tibble [987 × 14] (S3: tbl_df/tbl/data.frame)
$ COU : chr [1:987] "CAN" "CAN" "CAN" "CAN" ...
$ Country : chr [1:987] "Canada" "Canada" "Canada" "Canada" ...
$ VAR : chr [1:987] "BANKRUPTCIES" "BANKRUPTCIES" "BANKRUPTCIES" "BANKRUPTCIES" ...
$ Variable : chr [1:987] "Number of bankruptcies" "Number of bankruptcies" "Number of bankruptcies" "Number of bankruptcies" ...
$ MEA : chr [1:987] "INDEX" "INDEX" "INDEX" "INDEX" ...
$ Measure : chr [1:987] "Index 2007=100" "Index 2007=100" "Index 2007=100" "Index 2007=100" ...
$ ISIC4...7 : chr [1:987] "01_99" "01_99" "01_99" "01_99" ...
$ ISIC4...8 : chr [1:987] "Grand Total" "Grand Total" "Grand Total" "Grand Total" ...
$ Time : chr [1:987] "Q1-2011" "Q2-2011" "Q3-2011" "Q4-2011" ...
$ Unit Code : chr [1:987] "IDX" "IDX" "IDX" "IDX" ...
$ Unit : chr [1:987] "Index" "Index" "Index" "Index" ...
$ PowerCode Code: num [1:987] 0 0 0 0 0 0 0 0 0 0 ...
$ PowerCode : chr [1:987] "Units" "Units" "Units" "Units" ...
$ Value : num [1:987] 60 58.3 56.3 55.3 54 ...
[1] 987 14
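The all-NA columns above were dropped by name. As a hedged alternative, the sketch below removes every column that contains only missing values, without listing the columns. The `toy` table and its values are made up for illustration and are not part of the project data.

```r
library(dplyr)

# Toy stand-in for the bankruptcies table (values are made up for illustration)
toy <- tibble(
  Country = c("Canada", "Canada", "Japan"),
  Value = c(60.0, 58.3, 101.2),
  Flags = c(NA, NA, NA),
  `Reference Period` = c(NA, NA, NA)
)

# Keep only the columns that contain at least one non-missing value
toy_clean <- select(toy, where(~ !all(is.na(.x))))

names(toy_clean)
```

This approach does not remove duplicated columns such as TIME, which still have to be dropped explicitly.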
This project covers only the period from 2011 to 2020. To keep the workload manageable, the analysis is limited to 19 countries.
# Create new columns 'Quarter' and 'Year' from 'Time'
new_data <- mutate(data,
Quarter = str_sub(Time, 1, 2), # Extract Quarter
Year = as.numeric(str_sub(Time, 4))) # Extract Year and convert to numeric
# remove the original 'Time' column
new_data <- select(new_data, -Time)
new_data
# Get date range
date_range <- range(new_data$Year)
# Count unique countries
n_countries <- length(unique(new_data$Country))
# Print the results
n_countries
[1] 19
[1] 2011 2021
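The quarter and year extraction relies on every label following the "Qn-YYYY" layout. A minimal sketch of the same extraction with a format check is shown below; the `time_labels` vector is a hypothetical sample, not the project data.

```r
library(stringr)

# Hypothetical sample of labels in the same layout as the Time column
time_labels <- c("Q1-2011", "Q4-2020", "Q2-2021")

# Guard against labels that do not follow the "Qn-YYYY" pattern
stopifnot(all(str_detect(time_labels, "^Q[1-4]-\\d{4}$")))

# Characters 1-2 hold the quarter, characters 4 onward hold the year
quarter <- str_sub(time_labels, 1, 2)
year <- as.numeric(str_sub(time_labels, 4))

data.frame(quarter, year)
```

The check fails fast if a label is malformed, instead of silently producing an NA year.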
# Only need data from 2011-2020, excluding 2021
bankruptcy_data <- filter(new_data, Year != 2021)
bankruptcy_data
# Get date range of new updated dataset
date_range <- range(bankruptcy_data$Year)
# Extract start and end years from the range
start_year <- date_range[1]
end_year <- date_range[2]
# Count unique countries
n_countries <- length(unique(bankruptcy_data$Country))
# Print the results
cat("Number of country: ", n_countries, "\n")
Number of country: 19
Date range from: 2011 to 2020
# Load dataset
my_data <- read_csv("https://raw.githubusercontent.com/Alexburk93/Data_Wrangling_EDA/main/data/raw_data/GDP_Data/GDP_capita_1960_2022.csv")
Rows: 266 Columns: 10
── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Country Name
dbl (9): 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The years are stored as column headers, so pivot_longer() is used to
reshape them into two columns, Year and GDP.
# Pivot the data from wide to long format
joy <- pivot_longer(my_data,
cols = -c(`Country Name`),
names_to = "Year",
values_to = "GDP")
# Show the first few rows
Perform summary statistics, check for NAs, inspect the
dimensions with dim(), etc.
tibble [2,394 × 3] (S3: tbl_df/tbl/data.frame)
$ Country Name: chr [1:2394] "Aruba" "Aruba" "Aruba" "Aruba" ...
$ Year : chr [1:2394] "2011" "2012" "2013" "2014" ...
$ GDP : num [1:2394] 26043 25611 26515 26940 28419 ...
Country Name Year GDP
Length:2394 Length:2394 Min. : 217
Class :character Class :character 1st Qu.: 2099
Mode :character Mode :character Median : 6570
Mean : 16591
3rd Qu.: 19866
Max. :199383
NA's :66
Warning: Unknown or uninitialised column: `Indicator Code`.
< table of extent 0 >
[1] 66
[1] 2394 3
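The wide-to-long reshaping used for the GDP table can be sketched on a toy table. The country names and numbers below are hypothetical. The `names_transform` argument is an optional addition (not used in the original code) that stores Year as an integer instead of text.

```r
library(tidyr)
library(tibble)

# Toy wide table: one column per year (hypothetical countries and values)
wide <- tibble(
  `Country Name` = c("Aland", "Borduria"),
  `2011` = c(100, 200),
  `2012` = c(110, 210)
)

# One row per country-year; the former column headers become the Year column
long <- pivot_longer(
  wide,
  cols = -`Country Name`,
  names_to = "Year",
  values_to = "GDP",
  names_transform = list(Year = as.integer) # store Year as integer, not text
)

long
```

Converting Year at reshape time would remove the need for the later `as.numeric(Year)` calls, at the cost of differing from the character Year used in the merge step.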
Filter the data to only show the 19 countries and the date range the project is focused on.
# Define the list of 19 countries
countries_of_interest <- c("Australia", "Belgium", "Brazil", "Canada", "Denmark",
"Finland", "France", "Germany", "Iceland", "Italy",
"Japan", "Netherlands", "New Zealand", "Norway",
"South Africa", "Spain", "Sweden", "United Kingdom",
"United States")
# Filter the data for the 19 countries
gdp <- joy %>%
filter(`Country Name` %in% countries_of_interest)
# Define the date range of focus
start_year <- 2011
end_year <- 2020
# Filter the data for the date range
gdp <- gdp %>%
filter(as.numeric(Year) >= start_year & as.numeric(Year) <= end_year)
# Show the filtered data
gdp %>%
sample_n(10)
tibble [171 × 3] (S3: tbl_df/tbl/data.frame)
$ Country Name: chr [1:171] "Australia" "Australia" "Australia" "Australia" ...
$ Year : chr [1:171] "2011" "2012" "2013" "2014" ...
$ GDP : num [1:171] 62610 68078 68198 62558 56759 ...
Country Name Year GDP
Length:171 Length:171 Min. : 5735
Class :character Class :character 1st Qu.: 38808
Mode :character Mode :character Median : 46299
Mean : 45324
3rd Qu.: 53541
Max. :103554
Warning: Unknown or uninitialised column: `Indicator Code`.
< table of extent 0 >
[1] 0
[1] 171 3
# Group the filtered data by Year and calculate the mean GDP for each year
mean_gdp_by_year <- gdp %>%
group_by(Year) %>%
summarize(mean_GDP = mean(GDP, na.rm = TRUE))
# Show the mean GDP for each year
print(mean_gdp_by_year)
Visualization of the 10 countries with the highest GDP over time.
# Filter to include only the top 10 countries with the highest GDP values
top_10_gdp <- gdp %>%
group_by(`Country Name`) %>%
summarize(total_gdp = sum(GDP, na.rm = TRUE)) %>%
top_n(10, total_gdp) %>%
left_join(gdp, by = "Country Name")
# Convert GDP values to millions or billions
top_10_gdp <- top_10_gdp %>%
mutate(GDP_formatted = case_when(
GDP >= 1e12 ~ paste0(round(GDP / 1e12, 1), "T"), # Convert to trillions
GDP >= 1e9 ~ paste0(round(GDP / 1e9, 1), "B"), # Convert to billions
GDP >= 1e6 ~ paste0(round(GDP / 1e6, 1), "M"), # Convert to millions
TRUE ~ as.character(GDP) # Keep unchanged if less than 1 million
))
# Create a line plot of GDP over time with formatted values for the top 10 countries
ggplot(top_10_gdp, aes(x = Year, y = GDP, group = `Country Name`)) +
geom_line(aes(color = `Country Name`)) +
labs(title = "GDP Trends Over Time (Top 10 Countries)",
x = "Year",
y = "GDP",
color = "Country") +
scale_y_continuous(labels = function(x) paste0(x, "")) + # Ensure y-axis labels are character type
theme_minimal()
# Load data set from github
suicide_df = read_csv("https://raw.githubusercontent.com/Alexburk93/Data_Wrangling_EDA/main/data/raw_data/death-rate-from-suicides-gho%20new.csv")
Rows: 3876 Columns: 4
── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Entity, Code, Age-standardized suicide rate - Sex: both sexes
dbl (1): Year
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
suicide_df$`Age-standardized suicide rate - Sex: both sexes` = as.double(suicide_df$`Age-standardized suicide rate - Sex: both sexes`)
Warning: NAs introduced by coercion
Entity Code Year Age-standardized suicide rate - Sex: both sexes
Length:3876 Length:3876 Min. :2000 Min. : 0.000
Class :character Class :character 1st Qu.:2004 1st Qu.: 5.955
Mode :character Mode :character Median :2009 Median : 10.015
Mean :2009 Mean : 68.421
3rd Qu.:2014 3rd Qu.:141.230
Max. :2019 Max. :962.889
NA's :5
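The coercion warning above means some text entries could not be parsed as numbers. A minimal base R sketch for listing the offending entries before deciding how to handle them is given below; the `raw_rate` values are made up for illustration.

```r
# Made-up text values standing in for the suicide-rate column
raw_rate <- c("10.2", "5.9", "", "n/a", "12")

# Coerce to numeric; unparseable entries become NA
rate <- suppressWarnings(as.numeric(raw_rate))

# Entries that were present as text but could not be parsed as numbers
failed <- raw_rate[is.na(rate) & !is.na(raw_rate)]

failed
```

Inspecting `failed` shows whether the NAs come from empty strings, placeholder text, or formatting issues such as thousands separators.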
# Consider removing Lesotho from the analysis
suicide_df %>%
filter(is.na(`Age-standardized suicide rate - Sex: both sexes`))
# Remove the Region / Income classes and Lesotho
suicide_df_countries = suicide_df %>%
filter(Code != "0", Code != "LSO")
avg_country = suicide_df_countries %>%
group_by(Entity) %>%
summarise(avg_per_country = mean(`Age-standardized suicide rate - Sex: both sexes`)) %>%
arrange(desc(avg_per_country))
ggplot(avg_country, aes(x = avg_per_country)) +
geom_histogram() +
labs(x = "Mean Suicide rate", y = "Frequency", title = "Distribution of mean of suicide rate") +
theme_minimal()
# Use the raw file URL
url <- "https://github.com/Alexburk93/Data_Wrangling_EDA/raw/main/data/raw_data/WHR20_DataForTable2.1.xls"
response <- GET(url)
content <- content(response, "raw")
temp <- tempfile(fileext = ".xls")
writeBin(content, temp)
happiness_df <- read_excel(temp)
# View the data
head(happiness_df)
# Create a data frame with selected columns, chosen from the descriptions of the variables.
happiness_df_filtered = happiness_df %>%
select(`Country name`, `year` , `Life Ladder`, `Social support`, `Healthy life expectancy at birth`, `Freedom to make life choices`, `Perceptions of corruption`)
summary(happiness_df_filtered)
Country name year Life Ladder Social support Healthy life expectancy at birth
Length:1848 Min. :2005 Min. :2.375 Min. :0.2902 Min. :32.30
Class :character 1st Qu.:2010 1st Qu.:4.623 1st Qu.:0.7483 1st Qu.:58.30
Mode :character Median :2013 Median :5.363 Median :0.8340 Median :65.10
Mean :2013 Mean :5.446 Mean :0.8111 Mean :63.17
3rd Qu.:2016 3rd Qu.:6.268 3rd Qu.:0.9046 3rd Qu.:68.39
Max. :2019 Max. :8.019 Max. :0.9873 Max. :77.10
NA's :13 NA's :52
Freedom to make life choices Perceptions of corruption
Min. :0.2575 Min. :0.0352
1st Qu.:0.6431 1st Qu.:0.6927
Median :0.7575 Median :0.8036
Mean :0.7385 Mean :0.7491
3rd Qu.:0.8524 3rd Qu.:0.8737
Max. :0.9852 Max. :0.9833
NA's :31 NA's :103
avg_happiness_per_country = happiness_df_filtered %>%
group_by(`Country name`) %>%
summarise(avg_happiness = mean(`Life Ladder`)) %>%
arrange(desc(avg_happiness))
ggplot(avg_happiness_per_country, aes(x = avg_happiness)) +
geom_histogram() +
labs(x = "Mean Happiness Level", y = "Frequency", title = "Distribution of mean of happiness rate") +
theme_minimal()
Happiness Data for the 19 countries of interest from 2011 - 2019
# Define the list of 19 countries
countries_of_interest <- c("Australia", "Belgium", "Brazil", "Canada", "Denmark",
"Finland", "France", "Germany", "Iceland", "Italy",
"Japan", "Netherlands", "New Zealand", "Norway",
"South Africa", "Spain", "Sweden", "United Kingdom",
"United States")
# Filter the data for the 19 countries
happiness_new<- happiness_df_filtered %>%
filter(`Country name` %in% countries_of_interest)
# Define the date range of focus
start_year <- 2011
end_year <- 2020
# Filter the data for the date range
happiness_data <- happiness_new %>%
filter(as.numeric(year) >= start_year & as.numeric(year) <= end_year)
# Show the filtered data
happiness_data %>%
sample_n(10)
# Check if all countries of interest are present in the filtered data
missing_countries <- setdiff(countries_of_interest, unique(happiness_data$`Country name`))
# Print the missing countries, if any
if(length(missing_countries) > 0) {
print("The following countries are missing from the filtered data:")
print(missing_countries)
} else {
print("All countries of interest are selected from the filtered data.")
}
[1] "All countries of interest are selected from the filtered data."
# Count the occurrences of each country in the filtered data
country_counts <- table(happiness_data$`Country name`)
# Print the country names and their counts
print("Country Name\t\tCount")
[1] "Country Name\t\tCount"
Australia 9
Belgium 9
Brazil 9
Canada 9
Denmark 9
Finland 9
France 9
Germany 9
Iceland 6
Italy 9
Japan 9
Netherlands 9
New Zealand 9
Norway 7
South Africa 9
Spain 9
Sweden 9
United Kingdom 9
United States 9
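The counts above show that some countries have fewer yearly records than others. A hedged sketch for listing exactly which country-year pairs are absent is given below. The `toy` table, its column names, and its values are hypothetical stand-ins for the happiness data.

```r
library(dplyr)
library(tidyr)

# Toy panel in which country "B" has no record for 2012 (hypothetical values)
toy <- tibble(
  country = c("A", "A", "A", "B", "B"),
  year = c(2011, 2012, 2013, 2011, 2013)
)

# Every combination of the observed countries and years
expected <- expand(toy, country, year)

# Pairs that should exist but are absent from the data
missing_pairs <- anti_join(expected, toy, by = c("country", "year"))

missing_pairs
```

Knowing the specific gaps makes it easier to judge whether averages for those countries are comparable with the rest.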
Suicide Data for the 19 countries of interest from 2011 - 2019
# Change column name from `Entity` to `country name`
suicide_df <- rename(suicide_df, `Country name` = Entity)
# Define the list of 19 countries
countries_of_interest <- c("Australia", "Belgium", "Brazil", "Canada", "Denmark",
"Finland", "France", "Germany", "Iceland", "Italy",
"Japan", "Netherlands", "New Zealand", "Norway",
"South Africa", "Spain", "Sweden", "United Kingdom",
"United States")
# Filter the data for the 19 countries
new_suicide<- suicide_df %>%
filter(`Country name` %in% countries_of_interest)
# Define the date range of focus
start_year <- 2011
end_year <- 2020
# Filter the data for the date range
suicide_data <- new_suicide %>%
filter(as.numeric(Year) >= start_year & as.numeric(Year) <= end_year)
# Show the filtered data
suicide_data %>%
sample_n(10)
# Harmonize the key column names (`Country name`, `Year`) across the datasets
gdp_data <- rename(gdp, `Country name` = `Country Name`)
bankruptcy_data<- rename(bankruptcy_data, `Country name` = `Country`)
happiness_data = rename(happiness_data, `Year` = `year`)
suicide_data %>%
sample_n(10)
# Convert "Year" column to character type in all datasets
suicide_data <- mutate(suicide_data, Year = as.character(Year))
happiness_data <- mutate(happiness_data, Year = as.character(Year))
gdp_data <- mutate(gdp_data, Year = as.character(Year))
bankruptcy_data <- mutate(bankruptcy_data, Year = as.character(Year))
# Merge the datasets
suicide_analysis <- suicide_data %>%
left_join(happiness_data, by = c("Year", "Country name")) %>%
left_join(gdp_data, by = c("Year", "Country name")) %>%
left_join(bankruptcy_data, by = c("Year", "Country name"))
Save the merged dataset as a CSV file for use in the final-project analysis.
# # Drop specified columns
# suicide_analysis <- suicide_analysis %>%
# select(-c("Code", "COU", "PowerCode", "PowerCode Code"))
#
# # Save the merged dataset as a CSV file
# # Define the path to the folder on your desktop
# desktop_path <- "/home/alex/Uni/Master_US/2_Semester/Class_Data_Wrangeling_EDA/Data_Wrangling_EDA/data/"
#
# # Create the folder if it doesn't exist
# dir.create(desktop_path, showWarnings = FALSE)
#
# # Save the merged dataset as a CSV file in the specified folder
# write.csv(suicide_analysis, file.path(desktop_path, "suicide_analysis_2.csv"), row.names = FALSE)
data <- read_csv("https://raw.githubusercontent.com/Alexburk93/Data_Wrangling_EDA/main/data/suicide_analysis_2.csv")
New names:
Rows: 894 Columns: 19
── Column specification ──────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (10): Country name, VAR, Variable, MEA, Measure, ISIC4...14, ISIC4...15, Unit Code, Unit, Quarter
dbl (9): Year, Age-standardized suicide rate - Sex: both sexes, Life Ladder, Social support, Healthy life expecta...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# renaming the columns
data <- data %>%
rename(`Country_name` = `Country name`,
`Suicide_Rate` = `Age-standardized suicide rate - Sex: both sexes`,
`Life_ladder` = `Life Ladder`,
`Social_support` = `Social support`,
`Life_expectancy` = `Healthy life expectancy at birth`,
`Freedom_choices` = `Freedom to make life choices`,
`Corruption` = `Perceptions of corruption`)
# Drop redundant descriptor columns
data <- select(data, -Variable, -VAR, -MEA, -`Unit Code`)
data
[1] 894
[1] 15
[1] 894 15
tibble [894 × 15] (S3: tbl_df/tbl/data.frame)
$ Country_name : chr [1:894] "Australia" "Australia" "Australia" "Australia" ...
$ Year : num [1:894] 2011 2011 2011 2011 2011 ...
$ Suicide_Rate : num [1:894] 10.1 10.1 10.1 10.1 11 ...
$ Life_ladder : num [1:894] 7.41 7.41 7.41 7.41 7.19 ...
$ Social_support : num [1:894] 0.967 0.967 0.967 0.967 0.954 ...
$ Life_expectancy: num [1:894] 72.3 72.3 72.3 72.3 72.1 ...
$ Freedom_choices: num [1:894] 0.945 0.945 0.945 0.945 0.935 ...
$ Corruption : num [1:894] 0.382 0.382 0.382 0.382 0.269 ...
$ GDP : num [1:894] 62610 62610 62610 62610 38388 ...
$ Measure : chr [1:894] "Index 2007=100" "Index 2007=100" "Index 2007=100" "Index 2007=100" ...
$ ISIC4...14 : chr [1:894] "01_99C" "01_99C" "01_99C" "01_99C" ...
$ ISIC4...15 : chr [1:894] "Grand total (corporations only)" "Grand total (corporations only)" "Grand total (corporations only)" "Grand total (corporations only)" ...
$ Unit : chr [1:894] "Index" "Index" "Index" "Index" ...
$ Value : num [1:894] 132 138 143 145 122 ...
$ Quarter : chr [1:894] "Q1" "Q2" "Q3" "Q4" ...
2011 2012 2013 2014 2015 2016 2017 2018 2019
100 100 100 100 100 100 100 97 97
[1] 19
[1] "Australia" "New Zealand" "United States" "Spain" "Netherlands" "France"
[7] "Finland" "Belgium" "Japan" "South Africa" "Iceland" "Norway"
[13] "Sweden" "Italy" "Brazil" "United Kingdom" "Germany" "Canada"
[19] "Denmark"
[1] 2011 2012 2013 2014 2015 2016 2017 2018 2019
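In the merged table, each annual value (suicide rate, happiness, GDP) is repeated once per quarterly bankruptcy row, as the structure listing above shows. One possible way to avoid this, sketched below under the assumption that an annual mean of the bankruptcy index is an acceptable summary, is to collapse the quarterly series to one row per country and year before joining. All table names, column names, and values here are hypothetical.

```r
library(dplyr)

# Hypothetical quarterly index for one country and year
quarterly <- tibble(
  country = rep("A", 4),
  year = rep(2011, 4),
  quarter = c("Q1", "Q2", "Q3", "Q4"),
  value = c(100, 110, 90, 100)
)

# Hypothetical annual table with one row per country and year
annual <- tibble(country = "A", year = 2011, suicide_rate = 10.1)

# Collapse the quarterly series to an annual mean
yearly_index <- quarterly %>%
  group_by(country, year) %>%
  summarise(mean_value = mean(value, na.rm = TRUE), .groups = "drop")

# The join now keeps exactly one row per country and year
merged <- left_join(annual, yearly_index, by = c("country", "year"))

merged
```

With one row per country and year, per-year and per-country averages are no longer weighted by the number of quarterly records each country happens to have.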
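# Load highcharter, which provides highchart(), hc_add_series_map(), and worldgeojson
library(highcharter)
# Set highcharter options for tooltip decimals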
options(highcharter.tooltip.valueDecimals = 2)
# Create highcharter map visualization
hc <- highchart() %>%
hc_add_series_map(
worldgeojson, data, value = "GDP",
joinBy = c('name', 'Country_name'),
name = "GDP (current US$)"
) %>%
hc_colorAxis(stops = color_stops()) %>%
hc_title(text = "World Map") %>%
hc_subtitle(text = "GDP in current US$")
hc
# Set highcharter options for tooltip decimals
options(highcharter.tooltip.valueDecimals = 2)
# Create map visualizations for each variable
hc_life_expectancy <- highchart() %>%
hc_add_series_map(
worldgeojson, data,
value = "Life_expectancy",
joinBy = c('name', 'Country_name'),
name = "Life Expectancy"
) %>%
hc_colorAxis(stops = color_stops()) %>%
hc_title(text = "World Map") %>%
hc_subtitle(text = "Life Expectancy")
hc_suicide_rates <- highchart() %>%
hc_add_series_map(
worldgeojson, data,
value = "Suicide_Rate",
joinBy = c('name', 'Country_name'),
name = "Suicide Rates"
) %>%
hc_colorAxis(stops = color_stops()) %>%
hc_title(text = "World Map") %>%
hc_subtitle(text = "Suicide Rate")
hc_corruption <- highchart() %>%
hc_add_series_map(
worldgeojson, data,
value = "Corruption",
joinBy = c('name', 'Country_name'),
name = "Corruption"
) %>%
hc_colorAxis(stops = color_stops()) %>%
hc_title(text = "World Map") %>%
hc_subtitle(text = "Corruption")
# Display the map visualizations
list(hc_life_expectancy, hc_suicide_rates, hc_corruption)
[[1]]
[[2]]
[[3]]
NA
avg_gdp_per_year <- data %>%
group_by (`Year`) %>%
summarise(avg_gpd = mean(`GDP`))
avg_gdp_per_year
ggplot(avg_gdp_per_year, aes(x = Year, y = avg_gpd)) +
geom_line(color = "blue") +
labs(title = "Average GDP Over Time worldwide",
x = "Year",
y = "GDP") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(min(avg_gdp_per_year$Year), max(avg_gdp_per_year$Year), by = 1))
avg_happiness_per_year <- data %>%
group_by (`Year`) %>%
summarise(avg_happinnes = mean(`Life_ladder`, na.rm = T))
avg_happiness_per_year
ggplot(avg_happiness_per_year, aes(x = Year, y = avg_happinnes)) +
geom_line(color = "blue") +
labs(title = "Average Happiness Over Time worldwide",
x = "Year",
y = "Happiness") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(min(avg_happiness_per_year$Year), max(avg_happiness_per_year$Year), by = 1))
avg_Suicide_Rate_per_year <- data %>%
group_by (`Year`) %>%
summarise(avg_Suicide_Rate = mean(`Suicide_Rate`, na.rm = T))
avg_Suicide_Rate_per_year
ggplot(avg_Suicide_Rate_per_year, aes(x = Year, y = avg_Suicide_Rate)) +
geom_line(color = "blue") +
labs(title = "Average Suicide Rate Over Time worldwide",
x = "Year",
y = "Suicide Rate") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(min(avg_Suicide_Rate_per_year$Year), max(avg_Suicide_Rate_per_year$Year), by = 1))
avg_Bankruptcies_per_year <- data %>%
group_by (`Year`) %>%
summarise(avg_Bankruptcies = mean(`Value`, na.rm = T))
avg_Bankruptcies_per_year
ggplot(avg_Bankruptcies_per_year, aes(x = Year, y = avg_Bankruptcies)) +
geom_line(color = "blue") +
labs(title = "Average Bankruptcies Over Time worldwide",
x = "Year",
y = "Bankruptcies") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(min(avg_Bankruptcies_per_year$Year), max(avg_Bankruptcies_per_year$Year), by = 1))
# Finding the ratio for scaling the second axis
ratio <- max(avg_gdp_per_year$avg_gpd) / max(avg_Suicide_Rate_per_year$avg_Suicide_Rate)
# Creating the base plot
ggplot() +
# Line for average GDP
geom_line(data = avg_gdp_per_year, aes(x = Year, y = avg_gpd), size = 0.5) +
# Line for average suicide rate, rescaled to the GDP axis
geom_line(data = avg_Suicide_Rate_per_year, aes(x = Year, y = avg_Suicide_Rate * ratio), color = "red", size = 0.5) +
# Enhancing the plot
labs(title = "Average GDP and Suicide Rate Over Time",
x = "Year",
y = "Average GDP") +
scale_y_continuous(sec.axis = sec_axis(~ . / ratio, name = "Average Suicide Rate")) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(min(c(avg_gdp_per_year$Year, avg_Suicide_Rate_per_year$Year)),
max(c(avg_gdp_per_year$Year, avg_Suicide_Rate_per_year$Year)), by = 1))
# Finding the ratio for scaling the second axis
ratio <- max(avg_happiness_per_year$avg_happinnes) / max(avg_Suicide_Rate_per_year$avg_Suicide_Rate)
# Creating the base plot
ggplot() +
# Line for average happiness
geom_line(data = avg_happiness_per_year, aes(x = Year, y = avg_happinnes), size = 0.5) +
# Line for average suicide rate, rescaled to the happiness axis
geom_line(data = avg_Suicide_Rate_per_year, aes(x = Year, y = avg_Suicide_Rate * ratio), color = "red", size = 0.5) +
# Enhancing the plot
labs(title = "Average Happiness and Suicide Rate Over Time",
x = "Year",
y = "Average Happiness") +
scale_y_continuous(sec.axis = sec_axis(~ . / ratio, name = "Average Suicide Rate")) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(min(c(avg_happiness_per_year$Year, avg_Suicide_Rate_per_year$Year)),
max(c(avg_happiness_per_year$Year, avg_Suicide_Rate_per_year$Year)), by = 1))
# Finding the ratio for scaling the second axis
ratio <- max(avg_happiness_per_year$avg_happinnes) / max(avg_Suicide_Rate_per_year$avg_Suicide_Rate)
# Creating the base plot
ggplot() +
# Adding the line plot for Average Happiness
geom_line(data = avg_happiness_per_year, aes(x = Year, y = avg_happinnes), size = 0.5) +
# Adding the line plot for Suicide Rate adjusted by the ratio
geom_line(data = avg_Suicide_Rate_per_year, aes(x = Year, y = avg_Suicide_Rate * ratio), color = "red", size = 0.5) +
# Setting up the titles and labels
labs(title = "Average Happiness and Suicide Rate Over Time",
x = "Year",
y = "Average Happiness",
subtitle = "Suicide rates are scaled to compare against happiness") +
# Primary axis for Happiness, secondary axis for Suicide Rate (inversed scaling)
scale_y_continuous(name = "Average Happiness",
sec.axis = sec_axis(~ . / ratio, name = "Average Suicide Rate")) +
# Minimalist theme with angled x-axis texts
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
# Set x-axis breaks
scale_x_continuous(breaks = seq(min(c(avg_happiness_per_year$Year, avg_Suicide_Rate_per_year$Year)),
max(c(avg_happiness_per_year$Year, avg_Suicide_Rate_per_year$Year)), by = 1))
# Finding the ratio for scaling the second axis
ratio <- max(avg_Bankruptcies_per_year$avg_Bankruptcies) / max(avg_Suicide_Rate_per_year$avg_Suicide_Rate)
# Creating the base plot
ggplot() +
# Line for average bankruptcies
geom_line(data = avg_Bankruptcies_per_year, aes(x = Year, y = avg_Bankruptcies), size = 0.5) +
# Line for average suicide rate, rescaled to the bankruptcies axis
geom_line(data = avg_Suicide_Rate_per_year, aes(x = Year, y = avg_Suicide_Rate * ratio), color = "red", size = 0.5) +
# Enhancing the plot
labs(title = "Average Bankruptcies and Suicide Rate Over Time",
x = "Year",
y = "Average Bankruptcies") +
scale_y_continuous(sec.axis = sec_axis(~ . / ratio, name = "Average Suicide Rate")) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(min(c(avg_Bankruptcies_per_year$Year, avg_Suicide_Rate_per_year$Year)),
max(c(avg_Bankruptcies_per_year$Year, avg_Suicide_Rate_per_year$Year)), by = 1))
avg_Suicide_Rate_per_country = data %>%
group_by(Country_name) %>%
summarise(avg_suicide_rate = mean(Suicide_Rate, na.rm = TRUE)) %>%
arrange(avg_suicide_rate) %>%
mutate(Row_Number = row_number())
avg_Suicide_Rate_per_country
avg_happiness_per_country <- data %>%
group_by(Country_name) %>%
summarise(avg_happiness = mean(Life_ladder, na.rm = TRUE)) %>%
arrange(desc(avg_happiness))
least_happy = tail(avg_happiness_per_country, 2)
most_happy = head(avg_happiness_per_country, 2)
avg_Suicide_Rate_per_country %>%
filter(Country_name %in% least_happy$Country_name)
# Interpretation: Japan and South Africa are the two least happy countries in the sample, and both also have high suicide rates
avg_Suicide_Rate_per_country %>%
filter(Country_name %in% most_happy$Country_name)
# Interpretation: Finland is the second-happiest country but still ranks 16th of 19 when countries are sorted by ascending suicide rate
avg_gdp_per_country <- data %>%
group_by (`Country_name`) %>%
summarise(avg_gpd = mean(`GDP`)) %>%
arrange(desc(avg_gpd))
least_gdp = tail(avg_gdp_per_country, 2)
most_gdp = head(avg_gdp_per_country, 2)
avg_Suicide_Rate_per_country %>%
filter(Country_name %in% least_gdp$Country_name)
# Interpretation: Brazil has a low GDP per capita but still ranks low on suicide rate. South Africa has a low GDP per capita and also ranks poorly on suicide rate
avg_Suicide_Rate_per_country %>%
filter(Country_name %in% most_gdp$Country_name)
# Average bankruptcy index per country (named per_country so the per-year table is not overwritten)
avg_bankruptcies_per_country <- data %>%
group_by (`Country_name`) %>%
summarise(avg_bankruptcies = mean(`Value`, na.rm = T)) %>%
arrange(desc(avg_bankruptcies))
least_bank = tail(avg_bankruptcies_per_country, 2)
most_bank = head(avg_bankruptcies_per_country, 2)
avg_Suicide_Rate_per_country %>%
filter(Country_name %in% least_bank$Country_name)
# Interpretation: In this comparison of the extremes, bankruptcies show no clear association with suicide rates
avg_Suicide_Rate_per_country %>%
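filter(Country_name %in% most_bank$Country_name)
# Subset to Germany (this definition is not shown in the source; assumed here)
germany_data <- filter(data, Country_name == "Germany")
# Plot Germany GDP over Years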
avg_gdp_year_germany = germany_data %>%
group_by(Year) %>%
summarise(avg_gdp = mean(GDP))
ggplot(avg_gdp_year_germany, aes(x = Year, y = avg_gdp)) +
geom_line(color = "blue") +
labs(title = "Average GDP Over Time - Germany",
x = "Year",
y = "GDP") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(min(avg_gdp_year_germany$Year), max(avg_gdp_year_germany$Year), by = 1))
avg_suicide_year_germany = germany_data %>%
group_by(Year) %>%
summarise(avg_suicide = mean(Suicide_Rate))
ggplot(avg_suicide_year_germany, aes(x = Year, y = avg_suicide)) +
geom_line(color = "blue") +
labs(title = "Average Suicide Rate Over Time - Germany",
x = "Year",
y = "Suicide Rate") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(min(avg_suicide_year_germany$Year), max(avg_suicide_year_germany$Year), by = 1))
avg_bank_year_germany = germany_data %>%
group_by(Year) %>%
summarise(avg_bank = mean(Value))
ggplot(avg_bank_year_germany, aes(x = Year, y = avg_bank)) +
geom_line(color = "blue") +
labs(title = "Average bankruptcies Over Time - Germany",
x = "Year",
y = "Bankruptcies") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(min(avg_bank_year_germany$Year), max(avg_bank_year_germany$Year), by = 1))
avg_happiness_year_germany = germany_data %>%
group_by(Year) %>%
summarise(avg_happy = mean(Life_ladder))
ggplot(avg_happiness_year_germany, aes(x = Year, y = avg_happy)) +
geom_line(color = "blue") +
labs(title = "Average Happiness Over Time - Germany",
x = "Year",
y = "Happiness") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(min(avg_happiness_year_germany$Year), max(avg_happiness_year_germany$Year), by = 1))
[1] 9.25 8.98 9.22 9.19 9.00 8.82 8.34 8.46 8.27
# Finding the ratio for scaling the second axis
ratio <- max(avg_gdp_year_germany$avg_gdp) / max(avg_suicide_year_germany$avg_suicide)
# Creating the base plot
ggplot() +
# Adding the line for average GDP (primary axis)
geom_line(data = avg_gdp_year_germany, aes(x = Year, y = avg_gdp), size = 0.5) +
# Adding the line for the average suicide rate, rescaled to the primary axis
geom_line(data = avg_suicide_year_germany, aes(x = Year, y = avg_suicide * ratio), color = "red", size = 0.5) +
# Enhancing the plot
labs(title = "Average GDP and Suicide Rate Over Time",
x = "Year",
y = "Average GDP") +
scale_y_continuous(sec.axis = sec_axis(~ . / ratio, name = "Average Suicide Rate")) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(min(c(avg_gdp_year_germany$Year, avg_suicide_year_germany$Year)),
max(c(avg_gdp_year_germany$Year, avg_suicide_year_germany$Year)), by = 1))
# Finding the ratio for scaling the second axis
ratio <- max(avg_happiness_year_germany$avg_happy) / max(avg_suicide_year_germany$avg_suicide)
# Creating the base plot
ggplot() +
# Adding the line for average happiness (primary axis)
geom_line(data = avg_happiness_year_germany, aes(x = Year, y = avg_happy), size = 0.5) +
# Adding the line for the average suicide rate, rescaled to the primary axis
geom_line(data = avg_suicide_year_germany, aes(x = Year, y = avg_suicide * ratio), color = "red", size = 0.5) +
# Enhancing the plot
labs(title = "Average Happiness and Suicide Rate Over Time",
x = "Year",
y = "Average Happiness") +
scale_y_continuous(sec.axis = sec_axis(~ . / ratio, name = "Average Suicide Rate")) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(min(c(avg_happiness_year_germany$Year, avg_suicide_year_germany$Year)),
max(c(avg_happiness_year_germany$Year, avg_suicide_year_germany$Year)), by = 1))
# Finding the ratio for scaling the second axis
ratio <- max(avg_bank_year_germany$avg_bank) / max(avg_suicide_year_germany$avg_suicide)
# Creating the base plot
ggplot() +
# Adding the line for average bankruptcies (primary axis)
geom_line(data = avg_bank_year_germany, aes(x = Year, y = avg_bank), size = 0.5) +
# Adding the line for the average suicide rate, rescaled to the primary axis
geom_line(data = avg_suicide_year_germany, aes(x = Year, y = avg_suicide * ratio), color = "red", size = 0.5) +
# Enhancing the plot
labs(title = "Average Bankruptcies and Suicide Rate Over Time",
x = "Year",
y = "Average Bankruptcies") +
scale_y_continuous(sec.axis = sec_axis(~ . / ratio, name = "Average Suicide Rate")) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(min(c(avg_bank_year_germany$Year, avg_suicide_year_germany$Year)),
max(c(avg_bank_year_germany$Year, avg_suicide_year_germany$Year)), by = 1))
# Plot SA GDP over Years
avg_gdp_year_SA = SA_data %>%
group_by(Year) %>%
summarise(avg_gdp = mean(GDP))
ggplot(avg_gdp_year_SA, aes(x = Year, y = avg_gdp)) +
geom_line(color = "blue") +
labs(title = "Average GDP Over Time - SA",
x = "Year",
y = "GDP") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(min(avg_gdp_year_SA$Year), max(avg_gdp_year_SA$Year), by = 1))
avg_suicide_year_SA = SA_data %>%
group_by(Year) %>%
summarise(avg_suicide = mean(Suicide_Rate))
ggplot(avg_suicide_year_SA, aes(x = Year, y = avg_suicide)) +
geom_line(color = "blue") +
labs(title = "Average Suicide Rate Over Time - SA",
x = "Year",
y = "Suicide Rate") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(min(avg_suicide_year_SA$Year), max(avg_suicide_year_SA$Year), by = 1))
avg_happiness_year_SA = SA_data %>%
group_by(Year) %>%
summarise(avg_happy = mean(Life_ladder))
ggplot(avg_happiness_year_SA, aes(x = Year, y = avg_happy)) +
geom_line(color = "blue") +
labs(title = "Average Happiness Over Time - SA",
x = "Year",
y = "Happiness") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(min(avg_happiness_year_SA$Year), max(avg_happiness_year_SA$Year), by = 1))
avg_bank_year_sa = SA_data %>%
group_by(Year) %>%
summarise(avg_bank = mean(Value))
ggplot(avg_bank_year_sa, aes(x = Year, y = avg_bank)) +
geom_line(color = "blue") +
labs(title = "Average bankruptcies Over Time - SA",
x = "Year",
y = "Bankruptcies") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(min(avg_bank_year_sa$Year), max(avg_bank_year_sa$Year), by = 1))
# Finding the ratio for scaling the second axis
ratio <- max(avg_gdp_year_SA$avg_gdp) / max(avg_suicide_year_SA$avg_suicide)
# Creating the base plot
ggplot() +
# Adding the line for average GDP (primary axis)
geom_line(data = avg_gdp_year_SA, aes(x = Year, y = avg_gdp), size = 0.5) +
# Adding the line for the average suicide rate, rescaled to the primary axis
geom_line(data = avg_suicide_year_SA, aes(x = Year, y = avg_suicide * ratio), color = "red", size = 0.5) +
# Enhancing the plot
labs(title = "Average GDP and Suicide Rate Over Time - SA",
x = "Year",
y = "Average GDP") +
scale_y_continuous(sec.axis = sec_axis(~ . / ratio, name = "Average Suicide Rate")) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(min(c(avg_gdp_year_SA$Year, avg_suicide_year_SA$Year)),
max(c(avg_gdp_year_SA$Year, avg_suicide_year_SA$Year)), by = 1))
# Finding the ratio for scaling the second axis
ratio <- max(avg_bank_year_sa$avg_bank) / max(avg_suicide_year_SA$avg_suicide)
# Creating the base plot
ggplot() +
# Adding the line for average bankruptcies (primary axis)
geom_line(data = avg_bank_year_sa, aes(x = Year, y = avg_bank), size = 0.5) +
# Adding the line for the average suicide rate, rescaled to the primary axis
geom_line(data = avg_suicide_year_SA, aes(x = Year, y = avg_suicide * ratio), color = "red", size = 0.5) +
# Enhancing the plot
labs(title = "Average Bankruptcies and Suicide Rate Over Time - SA",
x = "Year",
y = "Average Bankruptcies") +
scale_y_continuous(sec.axis = sec_axis(~ . / ratio, name = "Average Suicide Rate")) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(min(c(avg_bank_year_sa$Year, avg_suicide_year_SA$Year)),
max(c(avg_bank_year_sa$Year, avg_suicide_year_SA$Year)), by = 1))
# Finding the ratio for scaling the second axis
ratio <- max(avg_happiness_year_SA$avg_happy) / max(avg_suicide_year_SA$avg_suicide)
# Creating the base plot
ggplot() +
# Adding the line for average happiness (primary axis)
geom_line(data = avg_happiness_year_SA, aes(x = Year, y = avg_happy), size = 0.5) +
# Adding the line for the average suicide rate, rescaled to the primary axis
geom_line(data = avg_suicide_year_SA, aes(x = Year, y = avg_suicide * ratio), color = "red", size = 0.5) +
# Enhancing the plot
labs(title = "Average Happiness and Suicide Rate Over Time - SA",
x = "Year",
y = "Average Happiness") +
scale_y_continuous(sec.axis = sec_axis(~ . / ratio, name = "Average Suicide Rate")) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_x_continuous(breaks = seq(min(c(avg_happiness_year_SA$Year, avg_suicide_year_SA$Year)),
max(c(avg_happiness_year_SA$Year, avg_suicide_year_SA$Year)), by = 1))
This project gave us the opportunity to explore an unfamiliar topic and to apply theoretical methods learned in class to a practical, real-world problem. We explored four distinct datasets: Suicide Rate, GDP per Capita, Bankruptcies, and Happiness Index, covering the years 2011 to 2019. Our initial approach was to analyze each dataset independently to understand its composition and to conduct preliminary exploratory data analysis. This included calculating and plotting the global average annual values for each dataset.
Following this foundational analysis, we moved on to the data wrangling phase and merged the four datasets. The resulting combined dataset covers 19 countries and forms the basis for further analysis. In the analytical phase, we looked for correlations among the variables across these 19 countries, but found no significant ones.
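A cross-country correlation check of this kind could look roughly like the following sketch. The data frame `toy` and all of its values are invented for illustration; only the column names (`Suicide_Rate`, `GDP`, `Value`, `Life_ladder`) follow the merged dataset. Using Spearman rank correlation with pairwise-complete observations is an assumption made here, not necessarily the method used in the project.

```r
# Hypothetical stand-in for the merged dataset; every value is invented.
set.seed(1)
toy <- data.frame(
  Country_name = rep(c("A", "B", "C"), each = 4),
  Suicide_Rate = runif(12, 5, 20),
  GDP          = runif(12, 5, 60),
  Value        = runif(12, 50, 150),   # bankruptcies
  Life_ladder  = runif(12, 4, 8)       # happiness score
)
toy$Value[3] <- NA  # missing values are common after merging several sources

# Pairwise Spearman correlations between the four numeric variables;
# "pairwise.complete.obs" drops a row only for the pairs where it has an NA.
num_cols <- c("Suicide_Rate", "GDP", "Value", "Life_ladder")
cor_mat <- cor(toy[, num_cols], use = "pairwise.complete.obs", method = "spearman")
round(cor_mat, 2)
```

A matrix like this gives a quick overview, but it does not say whether any coefficient is statistically significant; that requires a separate test per pair.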
To gain deeper insights, we narrowed our focus to two specific countries: Germany and South Africa. For each country, we conducted a detailed exploration of their data, plotting trends over the years and searching for any correlations between the variables within each national context. Our analysis did not reveal any significant correlations.
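For a single country, a significance test on two yearly series can be sketched as follows. The values in `toy_country` are invented and only mimic the shape of the data (one row per year, 2011 to 2019); the column names follow the dataset. The last lines compute how large a Pearson correlation must be to reach p < 0.05 with so few observations, which is one possible reason for the null result.

```r
# Hypothetical yearly series for one country; the numbers are invented.
toy_country <- data.frame(
  Year         = 2011:2019,
  Suicide_Rate = c(9.3, 9.0, 9.2, 9.2, 9.0, 8.8, 8.3, 8.5, 8.3),
  GDP          = c(44, 43, 45, 46, 41, 42, 44, 47, 46)
)

# Pearson correlation with a two-sided significance test
res <- cor.test(toy_country$GDP, toy_country$Suicide_Rate, method = "pearson")
res$estimate  # correlation coefficient
res$p.value   # p-value of the test

# Smallest |r| that is significant at the 5% level (two-sided) for n points,
# derived from the t distribution with n - 2 degrees of freedom.
n <- nrow(toy_country)
t_crit <- qt(0.975, df = n - 2)
r_crit <- t_crit / sqrt(t_crit^2 + (n - 2))
r_crit
```

With nine yearly observations, this threshold is about 0.67, so only fairly strong linear relationships can reach significance in a per-country analysis.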
Given the absence of correlations in our initial analyses, we see a need to widen the dataset. Options for expanding it include additional variables such as alcohol and drug usage, unemployment rates, and sunshine hours in the countries studied. In addition, moving from annual to monthly data might reveal trends and correlations that are not visible in the yearly data.
Furthermore, a more granular look at the socio-economic and political factors that shape country-specific behavior could improve the quality and depth of the analysis. Incorporating variables such as political stability, social conflicts, and specific cultural constructs could enrich our understanding of the correlations, or lack thereof, in the data. These factors often have profound impacts on economic conditions, happiness indices, and social issues like suicide rates and bankruptcy, and would provide a more nuanced framework for analysis.
If given the opportunity to revisit this project from the beginning, we would integrate these broader socio-political variables from the start, allowing for a more thorough initial data collection phase. This approach would enable us to capture a wider spectrum of influences, potentially revealing hidden patterns and correlations that were not evident in our previous analysis. Moreover, employing advanced statistical methods or machine learning techniques could further aid in identifying complex interactions between variables.
Continuing this project, our next steps would involve expanding our dataset to include these additional socio-political factors and applying more sophisticated analytical techniques. This could involve time-series analysis for trend detection or cluster analysis to identify similar behavioral patterns across different countries. By doing so, we aim to build a richer analytical model that can more accurately reflect the intricate realities influencing these critical societal indicators.
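The cluster analysis mentioned above could start from country-level averages, as in the following minimal sketch. The data frame `country_avgs` and its values are invented; the column names mirror the averages computed earlier (`avg_suicide_rate`, `avg_gdp`, `avg_happiness`). Hierarchical clustering with Ward linkage and three clusters are assumptions chosen for illustration.

```r
# Hypothetical country-level averages; every value is invented.
country_avgs <- data.frame(
  Country_name     = c("A", "B", "C", "D", "E", "F"),
  avg_suicide_rate = c(6, 7, 15, 16, 10, 11),
  avg_gdp          = c(10, 12, 45, 50, 30, 28),
  avg_happiness    = c(5.0, 5.2, 7.4, 7.5, 6.3, 6.1)
)

# Standardise the variables so that no single unit dominates the distances
features <- scale(country_avgs[, -1])
rownames(features) <- country_avgs$Country_name

# Hierarchical clustering on Euclidean distances with Ward linkage
hc <- hclust(dist(features), method = "ward.D2")

# Cut the tree into three groups of similar countries
clusters <- cutree(hc, k = 3)
clusters
```

Calling `plot(hc)` draws the dendrogram, which helps in judging whether three groups is a sensible cut for the real data.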